We address the relatively unexplored problem of hyper-parameter optimization (HPO) for federated learning (FL-HPO). We introduce Federated Loss suRface Aggregation (FLoRA), the first FL-HPO solution framework that can address use cases of tabular data and gradient-boosting training algorithms, in addition to the stochastic gradient descent / neural network setting typically addressed in the FL literature. The framework enables single-shot FL-HPO by first identifying a good set of hyper-parameters that are then used in a **single** FL training run. Thus, it enables FL-HPO solutions with minimal additional communication overhead compared to FL training without HPO. Our empirical evaluation of FLoRA for gradient-boosted decision trees on seven OpenML datasets demonstrates significant model accuracy improvements over the considered baselines, as well as robustness to an increasing number of parties involved in FL-HPO training.
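The single-shot recipe above can be summarized in a few lines. The sketch below is a simplified illustration, not the FLoRA algorithm itself: the random search over configurations, the plain averaging of per-party losses, and the names (`evaluate`, `parties`) are assumptions made for the example.

```python
import random

def sample_configs(space, n, seed=0):
    """Randomly sample n hyper-parameter configurations from a discrete space."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

def local_loss_surface(evaluate, party_data, configs):
    """Each party scores every candidate configuration on its own data only."""
    return [evaluate(party_data, cfg) for cfg in configs]

def single_shot_fl_hpo(parties, space, evaluate, n_candidates=20):
    configs = sample_configs(space, n_candidates)
    # Parties send back per-configuration losses, never raw data.
    surfaces = [local_loss_surface(evaluate, p, configs) for p in parties]
    # Aggregate the local loss surfaces (here: a plain average) and pick the
    # configuration minimising the aggregated loss; that configuration is then
    # used for the single federated training run that follows.
    aggregated = [sum(col) / len(col) for col in zip(*surfaces)]
    best_idx = min(range(len(configs)), key=lambda i: aggregated[i])
    return configs[best_idx]
```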
The mathematical formalization of the fruit fly olfactory circuit as a locality-sensitive hash (FlyHash) and bloom filter (FBF) has recently been proposed and "reprogrammed" for various machine learning tasks such as similarity search, outlier detection, and text embeddings. We propose a novel reprogramming of this hash and bloom filter to emulate the canonical nearest-neighbor classifier (NNC) in the challenging federated learning (FL) setting, where training and test data are spread across parties and no data can leave their respective parties. Specifically, we utilize FlyHash and FBF to create the FlyNN classifier, and theoretically establish conditions under which FlyNN matches NNC. We show how FlyNN can be trained in an FL setting with low communication overhead to produce FlyNNFL, and how it can be made differentially private. Empirically, we demonstrate that (i) FlyNN matches NNC accuracy over 70 OpenML datasets, and (ii) FlyNNFL training is highly scalable with low communication overhead, providing up to an $8\times$ speedup with $16$ parties.
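To make the FlyHash-plus-bloom-filter construction concrete, here is a small NumPy sketch. The projection sparsity, hash length, per-class filter update, and the elementwise-max merge of party filters are illustrative assumptions, not the exact FlyNN/FlyNNFL construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def flyhash(X, proj, k):
    """Sparse random projection followed by winner-take-all: keep the top-k units."""
    act = X @ proj                              # expand to a high-dimensional space
    idx = np.argsort(-act, axis=1)[:, :k]       # indices of the k largest activations
    H = np.zeros_like(act)
    np.put_along_axis(H, idx, 1.0, axis=1)
    return H

def fit_filters(X, y, proj, k, n_classes):
    """One bloom-filter-like binary vector per class, built from the hashed inputs."""
    H = flyhash(X, proj, k)
    return np.stack([np.clip(H[y == c].sum(0), 0, 1) for c in range(n_classes)])

def predict(X, filters, proj, k):
    """Assign the class whose filter overlaps most with the test hash."""
    return (flyhash(X, proj, k) @ filters.T).argmax(axis=1)

# Toy usage. In a federated setting, each party could build its own filters and a
# server could merge them, e.g. with an elementwise maximum, keeping communication low.
d, m, k, n_classes = 20, 400, 16, 3
proj = (rng.random((d, m)) < 0.1).astype(float)   # sparse binary projection matrix
X, y = rng.normal(size=(300, d)), rng.integers(0, n_classes, 300)
filters = fit_filters(X, y, proj, k, n_classes)
preds = predict(X, filters, proj, k)
```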
We propose a new computationally efficient first-order algorithm for model-agnostic meta-learning (MAML). The key enabling technique is to interpret MAML as a bilevel optimization (BLO) problem and to leverage sign-based SGD (signSGD) as the lower-level optimizer of the BLO. We show that MAML, through the lens of signSGD-oriented BLO, naturally yields an alternating optimization scheme that requires only first-order gradients of the learned meta-model. We term the resulting MAML algorithm Sign-MAML. Compared to the conventional first-order MAML (FO-MAML) algorithm, Sign-MAML is theoretically grounded, since it does not impose any assumption on the absence of second-order derivatives during meta-training. In practice, we show that Sign-MAML outperforms FO-MAML on various few-shot image classification tasks and, compared to MAML, achieves a much more graceful trade-off between classification accuracy and computational efficiency.
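A minimal PyTorch sketch of the alternating scheme described above, assuming a toy linear model: the inner loop adapts the meta-parameters on a task's support set with sign-based updates, and the outer loop applies a first-order meta-update. Step sizes, the model, and the task format are placeholders; this is not the paper's exact Sign-MAML procedure.

```python
import torch

def forward(params, x):
    """A toy linear model; params = (weight, bias)."""
    w, b = params
    return x @ w + b

def adapt_with_signsgd(params, loss_fn, x_s, y_s, lr_inner, steps=1):
    """Inner (lower-level) loop: adapt to a task's support set with signSGD."""
    adapted = [p.detach().clone().requires_grad_(True) for p in params]
    for _ in range(steps):
        loss = loss_fn(forward(adapted, x_s), y_s)
        grads = torch.autograd.grad(loss, adapted)
        # signSGD step: move by the sign of the gradient only.
        adapted = [(p - lr_inner * g.sign()).detach().requires_grad_(True)
                   for p, g in zip(adapted, grads)]
    return adapted

def meta_step(params, loss_fn, tasks, lr_inner, lr_outer):
    """Outer (upper-level) loop: first-order meta-update, no second-order terms."""
    meta_grads = [torch.zeros_like(p) for p in params]
    for x_s, y_s, x_q, y_q in tasks:
        adapted = adapt_with_signsgd(params, loss_fn, x_s, y_s, lr_inner)
        query_loss = loss_fn(forward(adapted, x_q), y_q)
        grads = torch.autograd.grad(query_loss, adapted)   # first-order gradients
        meta_grads = [m + g for m, g in zip(meta_grads, grads)]
    return [p - lr_outer * m / len(tasks) for p, m in zip(params, meta_grads)]
```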
This paper studies a novel design of optimization algorithms for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data are non-stationary, and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on a single learning task in isolation and identify a region in the network parameter space where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone, whereas this is not the case for the remaining directions. We argue that catastrophic forgetting in the continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the cones of the individual tasks encountered so far during training. Based on this observation, we introduce our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. These are then incorporated into the loss function in the form of a regularization term, so that upcoming tasks can be learned without forgetting. Furthermore, to control memory growth as the number of tasks increases, we propose a memory-efficient version of the algorithm called compressed DCO (DCO-COMP), which allocates a fixed-size memory for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
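The regularization idea above can be sketched compactly. The sketch below assumes the forbidden directions of each finished task are already available as the rows of a matrix (in the paper they are approximated with a per-task linear autoencoder, which is omitted here), and the quadratic penalty form is a simplification for illustration.

```python
import torch

def dco_penalty(theta, anchors, forbidden_dirs, strength=1.0):
    """Penalise displacement from each old task's solution along that task's
    top forbidden principal directions (rows of the corresponding matrix)."""
    penalty = torch.zeros((), dtype=theta.dtype)
    for theta_old, P in zip(anchors, forbidden_dirs):
        delta = theta - theta_old                # movement since that task finished
        penalty = penalty + (P @ delta).pow(2).sum()
    return strength * penalty

def continual_loss(theta, task_loss, anchors, forbidden_dirs, strength=1.0):
    """Current-task loss plus the direction-constrained regulariser."""
    return task_loss(theta) + dco_penalty(theta, anchors, forbidden_dirs, strength)
```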
Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can attack multiple images simultaneously, and thus offers a more unified threat model that obviates per-image attack algorithms. However, existing UAP generators are underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Toward authentic universality across image sources, we take a novel view of UAP generation as a customized instance of few-shot learning, which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with an improved attack success rate (ASR). We first consider the popular model-agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we find that the MAML framework does not directly offer a universal attack across image sources, which requires us to integrate it with another meta-learning framework, L2O. The resulting scheme for meta-learning a UAP generator (i) performs better (50% higher ASR) than baselines such as projected gradient descent, (ii) performs better (by 37%) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.
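The MAML-flavoured first step mentioned above can be sketched as follows, assuming for simplicity that all sources share one image resolution so a single perturbation tensor applies. The full method instead meta-learns the generator and the optimizer itself (L2O) to cope with heterogeneous sources; that part is not shown, and the step sizes and loop counts here are illustrative.

```python
import torch
import torch.nn.functional as F

def attack_loss(model, x, y, delta):
    """Untargeted objective: the negated classification loss, so that descending it
    pushes the victim model towards misclassifying x + delta."""
    return -F.cross_entropy(model(x + delta), y)

def adapt_uap(delta0, model, x, y, step, eps, n_steps=3):
    """Inner loop: a few PGD-style sign steps specialise the shared UAP to one source."""
    delta = delta0.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        g, = torch.autograd.grad(attack_loss(model, x, y, delta), delta)
        delta = (delta - step * g.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta

def meta_update(delta0, sources, step, eps, meta_lr):
    """Outer loop: first-order meta-update of the shared initialisation across sources."""
    meta_grad = torch.zeros_like(delta0)
    for model, x, y in sources:              # one (victim model, image batch) per source
        adapted = adapt_uap(delta0, model, x, y, step, eps)
        g, = torch.autograd.grad(attack_loss(model, x, y, adapted), adapted)
        meta_grad += g
    return (delta0 - meta_lr * meta_grad / len(sources)).clamp(-eps, eps)
```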
We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well-understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is a polynomial factor approximation to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground truth distance, which we show is information-theoretically optimal. Our work thus establishes fundamental lower and upper bounds on measuring distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.
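In symbols, one compact way to write the ground-truth notion and the consistency requirement above (our paraphrase; the formal definitions in the paper are more careful about the underlying distribution) is
$$\mathrm{dCE}(f) \;=\; \inf_{g\ \text{perfectly calibrated}} \ \mathbb{E}\big[\,|f(x)-g(x)|\,\big], \qquad c_1\,\mathrm{dCE}(f)^{\alpha} \;\le\; \mu(f) \;\le\; c_2\,\mathrm{dCE}(f)^{\beta},$$
where a calibration measure $\mu$ is consistent if such constants $c_1, c_2 > 0$ and exponents $\alpha, \beta \ge 1$ exist for all predictors $f$.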
We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistinguishability. For a collection of losses and a hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic template to learn predictors satisfying a guarantee we call Loss Outcome Indistinguishability. For a set of statistical tests--based on a collection of losses and a hypothesis class--a predictor is Loss OI if it is indistinguishable (according to the tests) from Nature's true probabilities over outcomes. By design, Loss OI implies omniprediction in a direct and intuitive manner. We simplify Loss OI further, decomposing it into a calibration condition plus multiaccuracy for a class of functions derived from the loss and hypothesis classes. By careful analysis of this class, we give efficient constructions of omnipredictors for interesting classes of loss functions, including non-convex losses. This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies in between multiaccuracy and multicalibration. We show that calibrated multiaccuracy implies Loss OI for the important set of convex losses arising from Generalized Linear Models, without requiring full multicalibration. For such losses, we show an equivalence between our computational notion of Loss OI and a geometric notion of indistinguishability, formulated as Pythagorean theorems in the associated Bregman divergence. We give an efficient algorithm for calibrated multiaccuracy with computational complexity comparable to that of multiaccuracy. In all, calibrated multiaccuracy offers an interesting tradeoff point between efficiency and generality in the omniprediction landscape.
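For concreteness, a hedged restatement of the omniprediction guarantee invoked above, in our own notation (the paper's definition also tracks the approximation error explicitly): a predictor $p$ is an omnipredictor for a loss collection $\mathcal{L}$ and hypothesis class $\mathcal{H}$ if for every $\ell \in \mathcal{L}$ there is a loss-specific post-processing $k_{\ell}$ of its predictions with
$$\mathbb{E}\big[\ell\big(y,\ k_{\ell}(p(x))\big)\big] \;\le\; \min_{h \in \mathcal{H}} \mathbb{E}\big[\ell\big(y,\ h(x)\big)\big] + \varepsilon .$$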
Introduced as a notion of algorithmic fairness, multicalibration has proved to be a powerful and versatile concept with implications far beyond its original intent. This stringent notion--that predictions be well calibrated across a rich class of intersecting subpopulations--provides its strong guarantees at a cost: the computational and sample complexity of learning multicalibrated predictors is high, and grows exponentially with the number of class labels. In contrast, the relaxed notion of multiaccuracy can be achieved more efficiently, yet many of the most desirable properties of multicalibration cannot be guaranteed assuming multiaccuracy alone. This tension raises a key question: can we learn predictors with multicalibration-style guarantees at a cost commensurate with multiaccuracy? In this work, we define and initiate the study of low-degree multicalibration. Low-degree multicalibration defines a hierarchy of increasingly powerful multi-group fairness notions that spans multiaccuracy and the original formulation of multicalibration at the extremes. Our main technical contribution demonstrates that key properties of multicalibration, related to fairness and accuracy, actually manifest as low-degree properties. Importantly, we show that low-degree multicalibration can be significantly more efficient than full multicalibration. In the multi-class setting, the sample complexity of achieving low-degree multicalibration improves exponentially (in the number of classes) over full multicalibration. Our work presents compelling evidence that low-degree multicalibration represents a sweet spot, pairing computational and sample efficiency with strong fairness and accuracy guarantees.
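One way to formalize the hierarchy sketched above (our notation; the paper's exact family of weight functions may differ): a predictor $p$ is degree-$k$ multicalibrated with respect to a class $\mathcal{C}$ if
$$\big|\,\mathbb{E}\big[\,c(x)\, w(p(x))\, \big(y - p(x)\big)\,\big]\,\big| \;\le\; \varepsilon \quad \text{for all } c \in \mathcal{C} \text{ and all bounded weight functions } w \text{ of degree at most } k.$$
Constant weights $w$ recover multiaccuracy, while allowing arbitrary weight functions of the predictions recovers full multicalibration.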
Land used for grazing cattle takes up about one-third of the land in the United States. These areas can be highly rugged, yet they need to be maintained to prevent weeds from taking over the nutritious grassland. This can be a daunting task, especially in the case of organic farming, since herbicides cannot be used. In this paper, we present the design of Cowbot, an autonomous weed-mowing robot for pastures. The Cowbot is an electric mower designed to operate in the rugged environment of cow pastures and provides a cost-effective method for weed control on organic farms. Path planning for the Cowbot is challenging because the weed distribution on pastures is unknown. Given a limited field of view, online path planning is necessary to detect weeds and plan mowing paths. We study the general online path planning problem for an autonomous mower with curvature and field-of-view constraints. We develop two online path planning algorithms that are able to utilize new information about weeds to optimize path length and ensure coverage. We deploy our algorithms on the Cowbot and perform field experiments to validate the suitability of our methods for real-time path planning. We also perform extensive simulation experiments which show that our algorithms result in up to a 60% reduction in path length compared to baseline boustrophedon and random-search-based coverage paths.
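To illustrate the kind of online decision the planner above must make, here is a toy greedy rule: weeds are only revealed inside a limited field of view, and the mower chooses between servicing the nearest visible weed and continuing a baseline boustrophedon sweep. The curvature (minimum turning radius) constraint and the paper's actual algorithms are omitted; all names here are illustrative.

```python
import math

def visible(pose, weed, fov_radius):
    """A weed is detected only once it enters the mower's field of view."""
    return math.dist(pose, weed) <= fov_radius

def next_waypoint(pose, sweep_path, detected_weeds, fov_radius):
    """Greedy rule: service the nearest visible weed, otherwise keep sweeping."""
    in_view = [w for w in detected_weeds if visible(pose, w, fov_radius)]
    if in_view:
        return min(in_view, key=lambda w: math.dist(pose, w))
    return sweep_path[0] if sweep_path else pose   # fall back to the coverage sweep
```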
In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pre-training has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embeddings (AWE) for variable-length segments. However, self-supervised models alone cannot learn perfect separation of the linguistic content as they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.
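As a rough illustration of the pipeline described above, the sketch below pools pre-extracted self-supervised frame features for a variable-length segment into a fixed-size acoustic word embedding with a small recurrent encoder. The feature dimension, architecture, and names are assumptions for the example, not the paper's exact AWE model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AWEEncoder(nn.Module):
    """Maps a (batch, time, feat_dim) tensor of frame features to fixed-size embeddings."""
    def __init__(self, feat_dim=768, hidden=256, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, emb_dim)

    def forward(self, frames):
        _, h = self.rnn(frames)                  # h: (2, batch, hidden), final states
        h = torch.cat([h[0], h[1]], dim=-1)      # concatenate the two directions
        return F.normalize(self.proj(h), dim=-1) # unit-length word embeddings

# Usage on dummy segments of pre-extracted self-supervised features.
segments = torch.randn(4, 37, 768)               # 4 segments, 37 frames each
embeddings = AWEEncoder()(segments)              # shape (4, 128)
```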